meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

Authors

  • Xu Sun
  • Xuancheng Ren
  • Shuming Ma
  • Houfeng Wang
Abstract

We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-k elements (in terms of magnitude) are kept. As a result, only k rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction (k divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1–4% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given.
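The idea can be illustrated with a short NumPy sketch (an illustrative reconstruction under simplifying assumptions, not the authors' released code): for a linear layer y = Wx, the backward pass keeps only the k largest-magnitude components of the output gradient, so the weight gradient has at most k non-zero rows and the input gradient reads only k rows of W. The helper names top_k_mask and linear_backward_meprop are hypothetical.

```python
import numpy as np

def top_k_mask(v, k):
    # Keep only the k largest-magnitude entries of v; zero out the rest.
    idx = np.argpartition(np.abs(v), -k)[-k:]
    mask = np.zeros_like(v)
    mask[idx] = 1.0
    return v * mask

def linear_backward_meprop(W, x, grad_y, k):
    # Backward pass of y = W @ x with a top-k sparsified output gradient.
    # Only k rows of grad_W are non-zero, and grad_x touches only k rows of W.
    sparse_grad_y = top_k_mask(grad_y, k)
    grad_W = np.outer(sparse_grad_y, x)   # at most k non-zero rows
    grad_x = W.T @ sparse_grad_y          # effectively uses only k rows of W
    return grad_W, grad_x

# Toy check: with k = 20 out of 500 output units (4%), at most 20 rows of W
# receive a non-zero gradient in this pass.
rng = np.random.default_rng(0)
W = rng.standard_normal((500, 300))
x = rng.standard_normal(300)
grad_y = rng.standard_normal(500)
grad_W, grad_x = linear_backward_meprop(W, x, grad_y, k=20)
print(np.count_nonzero(np.any(grad_W != 0, axis=1)))  # prints at most 20
```

A dense sketch like this only zeroes out gradient entries; the speed-up reported in the paper comes from restricting the actual matrix computation to the selected k rows or columns.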


Similar articles

Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method

We propose a simple yet effective technique to simplify the training and the resulting model of neural networks. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-k elements (in terms of magnitude) are kept. As a result, only k rows or columns (depending on the layout) of ...

Full text

Applying Variants of Minimal Effort Backpropagation (meProp) on Feedforward Neural Networks

Neural network training can often be slow, with the majority of training time spent on backpropagation. In July of this year, Wang et al. (2017) devised a technique called minimal effort backpropagation (meProp), which reduces the computational cost of backpropagation through neural networks by computing only the k most influential rows of the gradient for any hidden layer weight matrix. In the...

Full text

Layer multiplexing FPGA implementation for deep back-propagation learning

Training of large scale neural networks, like those used nowadays in Deep Learning schemes, requires long computational times or the use of high performance computation solutions like those based on cluster computation, GPU boards, etc. As a possible alternative, in this work the Back-Propagation learning algorithm is implemented in an FPGA board using a multiplexing layer scheme, in which a si...

Full text

Deep Big Multilayer Perceptrons for Digit Recognition

The competitive MNIST handwritten digit recognition benchmark has a long history of broken records since 1998. The most recent advancement by others dates back 8 years (error rate 0.4%). Good old on-line back-propagation for plain multi-layer perceptrons yields a very low 0.35% error rate on the MNIST handwritten digits benchmark with a single MLP and 0.31% with a committee of seven MLP. All we...

Full text

Learning Deep ResNet Blocks Sequentially using Boosting Theory

Deep neural networks are known to be difficult to train due to the instability of back-propagation. A deep residual network (ResNet) with identity loops remedies this by stabilizing gradient computations. We prove a boosting theory for the ResNet architecture. We construct T weak module classifiers, each of which contains two of the T layers, such that the combined strong learner is a ResNet. Therefore,...

Full text


Publication date: 2017